Research compendia as R packages
Scientific workflows: Tools and Tips 🛠️
7/20/23
What is this lecture series?
📅 Every 3rd Thursday 🕓 4-5 p.m. 📍 Webex
- One topic from the world of scientific workflows
- For topic suggestions send me an email
- If you don’t want to miss a lecture
- Slides provided on Github
Steps of a scientific project
How to properly structure the project?
I want
- 🔃 Reproducibility (for you and others)
- 🏋 Reliability (will it work again?)
- ⚙ Re-usability (don’t re-invent the wheel)
- 🔍 Visibility (let others see and use your work)
How? Use a research compendium!
What is a research compendium?
- Collection of all digital parts of a research project (data + code + text)
The goal of a research compendium is to provide a standard and easily recognizable way for organizing the digital materials of a project to enable others to inspect, reproduce, and extend the research.
From Marwick et al. 2018
Principles for building research compendia
- Stick with the conventions in your field
- Keep data, methods and output separate
- Original data read only, output considered disposable
- Specify the computational environment
- Key components for sharing the compendium include
  - Licence
  - Version control
  - Metadata
  - Persistent identifier (e.g. DOI)
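A minimal sketch of the first principles in base R, not a prescribed layout: the folder names, the toy data file `measurements.csv` and the file `session-info.txt` are assumptions made for illustration. It separates data, methods and output, marks the raw data as read only, and records the computational environment.

```r
# Hypothetical compendium root; a temporary directory keeps the sketch side-effect free
root <- file.path(tempdir(), "mycompendium")

# Keep data, methods and output in separate folders (folder names are assumptions)
dirs <- file.path(root, c("data-raw", "R", "analysis", "output"))
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Store a toy raw data file once, then mark it read only
raw_file <- file.path(root, "data-raw", "measurements.csv")
if (!file.exists(raw_file)) {
  toy_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
  write.csv(toy_data, raw_file, row.names = FALSE)
}
Sys.chmod(raw_file, mode = "0444")

# Record the computational environment (R version, platform, attached packages)
env_file <- file.path(root, "session-info.txt")
writeLines(capture.output(sessionInfo()), env_file)
```

Everything under `output` can then be deleted and regenerated from `data-raw` and the scripts at any time. A plain text dump of `sessionInfo()` is only the simplest way to describe the environment; dedicated tools can pin package versions more strictly.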
Examples of different complexities
Small compendium
Medium size compendium
They are R packages!
R packages as research compendia
Basic idea: Hijack the R package development ecosystem to build a research compendium
Different use cases, e.g.
- Publish code, data and analysis scripts alongside your paper
- Publish a dataset in a way that other people can work with it in R
Some benefits of R packages
- Benefit from quality control mechanisms built around R packages
- Additional packages around this ecosystem to make your life easier
- Easy documentation
- Easy sharing of data
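To illustrate the documentation point, here is a small sketch of a function documented with roxygen2-style comments. The function `fahrenheit_to_celsius()` is a hypothetical example, not part of the lecture material. In a package, running `devtools::document()` would turn these comments into a help page.

```r
#' Convert temperatures from Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of temperatures in degrees Celsius.
#' @export
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
fahrenheit_to_celsius <- function(temp_f) {
  # Fail early on non-numeric input instead of returning nonsense
  stopifnot(is.numeric(temp_f))
  (temp_f - 32) * 5 / 9
}
```

Keeping the documentation directly above the code makes it harder for the two to drift apart.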
Hands-on: Create a research compendium with the R package structure
Find a detailed step-by-step guide on the website
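As a rough idea of what such a setup can look like, here is a sketch using usethis. It is not the step-by-step guide from the website: the compendium name `mycompendium`, the licence holder, and the file names are assumptions, and the sketch assumes that the usethis package is installed.

```r
library(usethis)

# Hypothetical compendium name and location; a temp dir avoids touching real projects
path <- file.path(tempdir(), "mycompendium")

# Create the package skeleton (DESCRIPTION, NAMESPACE, R/) without opening a new session
create_package(path, open = FALSE)

# Make the new package the active project so the following calls act on it
proj_set(path)

# Add a licence (the holder name is a placeholder)
use_mit_license("Jane Doe")

# Add data-raw/measurements.R, a script for preparing a dataset to ship with the package
use_data_raw("measurements", open = FALSE)

# Add R/analysis_functions.R as a place for reusable analysis functions
use_r("analysis_functions", open = FALSE)
```

From here, `devtools::check()` can be run on the package to use the quality checks built around R packages.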
Summary and Conclusions
It’s convenient to have standards you can follow
- R packages provide a helpful development ecosystem that we can hijack for our research compendia
- You can develop your compendium in different ways
  - Purpose is for people to install the package in the end
  - Purpose is just to use the quality checks from RStudio
usethis is a great workflow package that allows us to seamlessly follow this workshop
Outlook
The nice thing:
- This is also easy to set up with usethis and friends
- They are also documented on the lecture series website
Next lecture
Summer/Conference break in August and September!
Time for some feedback from you!
Please fill out the questionnaire if you have time (5 mins)
Thank you for your attention :)
Questions?